Hi, I am trying to read a string which has some data as below: good Good good better good excellent good Can you please let me know how to read it one by one and count the most repeated word? I tried using StringIO and string to read it line by line and a counter for counting, but I am failing to do so.
Regards, Anil
You can use Counter from the collections module to count words. To read line by line, just take your string and call .split() with a newline ("\n") as the separator.
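A minimal sketch of that suggestion, assuming the words sit one per line in the string and that "Good" and "good" should count as the same word. The function name most_repeated_word and the sample string layout are illustrative assumptions, not something from the original post:

```python
from collections import Counter

# Sample data from the question; the one-word-per-line layout is assumed.
text = "good\nGood\ngood\nbetter\ngood\nexcellent\ngood"


def most_repeated_word(text):
    """Return (word, count) for the most frequent non-empty line, ignoring case."""
    # Split on newlines, trim whitespace, lowercase, and skip blank lines.
    words = [line.strip().lower() for line in text.split("\n") if line.strip()]
    if not words:
        return None  # nothing to count
    # most_common(1) returns a list with the single highest (word, count) pair.
    return Counter(words).most_common(1)[0]


print(most_repeated_word(text))  # ('good', 5)
```

If case should matter, dropping the .lower() call keeps "Good" and "good" as separate entries.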
-Harrison 9 years ago
Hi Harrison,
It is reading only the first line in the string. Can you please help me with this? The code I have written is below:
import nltk
#import re
#from nltk.draw.tree import draw_trees
import string
import io
import StringIO
import counter
from nltk.corpus import state_union
from nltk.tokenize import word_tokenize
from nltk.tokenize import PunktSentenceTokenizer
from nltk.tag import pos_tag

text=state_union.raw("/home/hduser/mail_file")
cust_tonizer=PunktSentenceTokenizer(text)
tokenized=cust_tonizer.tokenize(text)
words=word_tokenize(text)
tags=pos_tag(words)
for t in tags:
    if t[1] == "JJ" or t[1] == "JJR":
        data = t[0]
        #matches=value.append(t[0])
        #fdist = nltk.FreqDist(value)
        #most_common = fdist.max()
        #top_three = fdist.keys()[:3]
        print data
        #print top_three
    else:
        continue

val=data.split('n')
counter={}
for line in val:
    print line
    counter[line] = counter.get(line, 0) + 1
cnt=sorted([ (freq,word) for word, freq in counter.items() ], reverse=True)[:3]
print cnt
-Anilt 9 years ago
Hello, my name is Mike and I am new to this forum. I have been learning Python for a few months but I think I could answer the question in this thread.
This is how I would do the task:
fhand = open('results.txt')  # I assume the results (good, better, etc.) are in a text file
marks = dict()  # Empty dictionary, will include the frequency of all results

for line in fhand:  # Going through the marks in the text file (all words are lowercased)
    if len(line) > 1:
        line = line.lower().strip()
        marks[line] = marks.get(line, 0) + 1

max_value = 0  # This pair of variables will store the most frequent word
max_key = ''  # '' is an empty string

for key, value in marks.items():
    if value > max_value:
        max_value = value
        max_key = key

print(max_key, max_value)  # Hope this is what you are looking for.
fhand.close()
Please let me know if the code doesn't work properly.
Here is the link to the code (PNG file): https://goo.gl/YDr8M6
Mike
-mnalevanko 9 years ago
Thanks for sharing Mike! I went ahead and wrote up a way for people to use tags: [code] [/code] to make code much easier to read here.
-Harrison 9 years ago